02. Lesson Outline


Data wrangling process:

  • Gather
  • Assess (this lesson)
  • Clean

Assessing your data is the second step in data wrangling. When assessing, you're like a detective at work, inspecting your dataset for two things: data quality issues (i.e., content issues) and lack of tidiness (i.e., structural issues).
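To make the two categories concrete, here is a minimal sketch in plain Python (no pandas) run on a small, hypothetical set of patient records. The column names, values, and the helper functions `find_quality_issues` and `find_tidiness_issues` are invented for illustration and are not taken from the course dataset. The quality check looks at the *content* of a column (missing or impossible values), while the tidiness check looks at the *structure* of the table (one variable, the treatment arm, spread across several column names).

```python
# Hypothetical toy records; column names and values are invented for illustration.
records = [
    {"patient_id": 1, "height_in": 65, "treatment_a_dose": "41u", "treatment_b_dose": None},
    {"patient_id": 2, "height_in": None, "treatment_a_dose": None, "treatment_b_dose": "33u"},
    {"patient_id": 3, "height_in": -70, "treatment_a_dose": "48u", "treatment_b_dose": None},
]


def find_quality_issues(rows):
    """Content check: flag missing heights and impossible (non-positive) heights."""
    issues = []
    for row in rows:
        height = row["height_in"]
        if height is None:
            issues.append((row["patient_id"], "missing height"))
        elif height <= 0:
            issues.append((row["patient_id"], "non-positive height"))
    return issues


def find_tidiness_issues(rows, prefix="treatment_", suffix="_dose"):
    """Structural check: return columns whose names encode values of one variable.

    If more than one column matches the hypothetical prefix/suffix pattern, the
    treatment arm is stored in the column headers instead of in its own column.
    """
    columns = [c for c in rows[0] if c.startswith(prefix) and c.endswith(suffix)]
    return columns if len(columns) > 1 else []


print(find_quality_issues(records))   # content (quality) issues
print(find_tidiness_issues(records))  # structural (tidiness) issues
```

The two checks are kept separate on purpose: fixing the negative height means correcting a value, whereas fixing the dose columns means reshaping the table, which is why the two kinds of issues are assessed and later cleaned differently.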

Assessing is the precursor to cleaning. You can't clean something that you don't know exists! In this lesson, you'll learn to identify and categorize common data quality and tidiness issues. Of the course's four lessons, this one is the shortest and the most "hands-off" code-wise, because assessing is passive compared with gathering and cleaning. We have tried to include quizzes wherever possible.

This lesson will be structured as follows:

  • You'll be introduced to, and motivated to assess (and later clean), the dataset for lessons 3 and 4: Phase II clinical trial data comparing the efficacy and safety of a new oral insulin for treating diabetes
  • You'll learn to distinguish between dirty data and messy data
  • You'll assess the data visually and programmatically to identify:
    • Data quality issues
    • Tidiness issues
  • You'll learn about data quality dimensions and categorize each of the data quality issues identified above into its appropriate dimension

To begin, I want to introduce you to the dataset you will be assessing in this lesson.